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Daubechies wavelets are a powerful systematic basis set for electronic structure calculations be- 
cause they are orthogonal and localized both in real and Fourier space. We describe in detail 
how this basis set can be used to obtain a highly efficient and accurate method for density func- 
tional electronic structure calculations. An implementation of this method is available in the 
ABINIT free software package. This code shows high systematic convergence properties, very 
good performances and an excellent efficiency for parallel calculations. 



PACS numbers: 

I. INTRODUCTION 

In recent years, the Kohn-Sham formalism of the den- 
sity functional theory (DFT) approach has proven to be 
one of the most efficient and reliable first-principle meth- 
ods for investigating material properties and processes 
that exhibit quantum mechanical behavior. The high ac- 
curacy of the results together with the relatively simple 
form of the exchange-correlation functionals make this 
method arguably the most powerful tool for ab initio 
simulations of the properties of matter. The computa- 
tional machinery of DFT calculations has been widely 
developed in the last decade, giving rise to a plethora of 
DFT codes. The use of DFT calculation has thus become 
more and more common, and its domain of application 
includes solid state physics, chemistry, material science, 
biology and geology. 

One of the most important characteristics of a DFT 
code is the set of basis functions used to express the 
Kohn-Sham (KS) orbitals. The domain of applicability 
of a code is tightly connected to this choice. For example, 
a non-localized basis set like plane waves is highly suit- 
able for electronic structure calculations of periodic and 
homogeneous systems, while it is much less efficient in ex- 
panding localized functions, which have a wider range of 
components in reciprocal space. For these reasons DFT 
codes based on plane waves are not well suited to sim- 
ulate inhomogeneous or isolated systems like molecules, 
due to the high memory requirements for such kind of 
simulations. 

A strong distiction should also be made between codes 
that use systematic and non-systematic basis sets. A sys- 
tematic basis set allows us to calculate the solution of the 
KS equations with arbitrarily high precision as the num- 
ber of basis functions is increased. In other terms, the 
numerical precision of the results is related to the num- 
ber of basis functions used to expand the KS orbitals. 
With such a basis set it is thus possible to obtain results 
that are free of errors related to the choice of the basis. 



eliminating a source of uncertainty. This is particularly 
important in view of the fact that highly accurate approx- 
imations to the exchange correlation functional are now 
available such as the PBE functional ([T}. Some of these 
functionals also contain van der Waals interactions (j2}. 
A systematic basis set allows us to accurately calculate 
the solution of a particular exchange correlation func- 
tional. On the other hand, non-systematic basis sets, 
for example gaussians, often become over complete and 
numerical instabilities arise before absolute convergence 
can be achieved. Such basis sets are more difficult to use, 
since the basis set must be carefully tuned by hand by 
the user, which will sometimes require some preliminary 
knowledge of the system under investigation. This is the 
most important weakness of this popular basis set. 

Another property which has a role in the performances 
of a DFT code is the orthogonality of the basis set. The 
use of nonorthogonal basis sets requires the calculation 
of the overlap matrix of the basis functions and to per- 
form various operations with this overlap matrix such as 
inverting the matrix, by iterative or non-iterative meth- 
ods. This makes methods based on non-orthogonal sys- 
tematic basis functions not only more complicated but 
also slower. 

Daubechies wavelets ^ have virtually all the proper- 
ties that one might desire of a basis set being used for the 
simulation of isolated or inhomogeneous systems. They 
form a systematic orthogonal and smooth basis, localized 
both in real and Fourier space and that allows for adap- 
tivity. A DFT approach based on such functions will 
meet both the requirements of precision and localization 
found in many applications. In this paper, we will de- 
scribe in detail a DFT method based on a Daubechies 
wavelets basis set. This method is implemented in a 
DFT code, distributed under GNU-GPL license and in- 
tegrated in the ABINIT software package. A separate, 
standalone version of this code is also available and dis- 
tributed under GNU-GPL license ([5]). In the next few 
paragraphs we will discuss the importance of the prop- 
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erties of Daubechies wavelets in the context of electronic 
structure calculations. 

A wavelet basis consists of a family of functions gen- 
erated from a mother function and its translations on 
the points of a uniform grid of spacing h. The number 
of basis functions is increased by decreasing the value of 
h. Thanks to the systematicity of the basis, this will 
make the numerical description more precise. The de- 
gree of smoothness determines the speed with which one 
converges to the exact result as h is decreased. The de- 
gree of smoothness increases as one goes to higher order 
Daubechies wavelets. In our method we use Daubechies 
wavelets of order 16. This together with the fact that 
our method is quasi variational gives a convergence rate 
of h^"^. Obtaining such a high convergence rate is es- 
sential in the context of electronic structure calculations 
where one needs highly accurate results for basis sets 
of acceptable size. The combination of adaptivity and 
a high order convergence rate is typically not achieved 
in other electronic structure programs using systematic 
real space methods (|6j). An adaptive finite element code, 
using cubic polynomial shape functions (|7|) , has a conver- 
gence rate of h^. Finite difference methods have some- 
times low dD or high convergence rates ([Qj) but are 
not adaptive. 

As discussed above, localization in real space is essen- 
tial for molecular systems. Basis sets that are not lo- 
calized in real space are wasteful in this context. For 
instance, with plane waves one has to fill an orthorhom- 
bic cell into which the molecule fits. Large subregions of 
the cell may contain no atoms and therefore no charge 
density, but this feature can not be exploited with plane 
waves. Since Daubechies wavelets have a compact sup- 
port, one can consistently define a set of localization pa- 
rameters which allows us to put the basis functions only 
on the points which are sufficiently close to the atoms. 
The computational volume in our method is thus given 
only by the union of spheres centered on all the atoms 
in the system. Real space localization is also necessary 
for the implementation of linear scaling algorithms ()T0)| . 
This basis set is thus a promising candidate for develop- 
ing such algorithms. 

Localization in Fourier space is useful for precondition- 
ing purposes. For a given system, the convergence rate of 
the minimization process depends on the highest eigen- 
value of the Hamiltonian operator. Since the high fre- 
quency spectrum of the Hamiltonian is dominated by the 
kinetic energy operator, high kinetic energy basis func- 
tions are therefore also approximate eigenfunctions of the 
Hamiltonian. A function localized in Fourier space is an 
approximate eigenfunction of the kinetic energy operator. 
By using such functions as basis functions for the KS or- 
bitals the high energy spectrum of the Hamiltonian can 
thus easily be preconditioned. 

A high degree of adaptivity is necessary for all-electron 
calculations since highly localized core electrons require 
a much higher spatial resolution than the valence wave- 
function away from the atomic core. High adaptivity 



can in principle be obtained with a wavelet basis and 
wavelet based all-electron electronic structure programs 
have been developed (flTl IT2|l . In contrast to these devel- 
opments we use pseudopotentials since such pseudopo- 
tentials are the easiest way to incorporate the relativis- 
tic effects that are important for heavy elements. The 
use of pseudopotentials drastically reduces the need for 
adaptivity and we have therefore only two levels of adap- 
tivity. We have a high resolution region that contains 
all the chemical bonds and a low resolution region fur- 
ther away from the atoms where the wavefunctions decay 
exponentially to zero. In the low resolution region each 
grid point carries a single basis function. In the high res- 
olution region it carries in addition 7 wavelets. In terms 
of degrees of freedom, the high resolution region is thus 
8 times denser than the low resolution region. In com- 
parison with a plane wave methods our wavelet method 
is therefore particularly efficient for open structures with 
large empty spaces and a relatively small bonding region. 

The outline of this paper is as follows: in the 
next section we describe the fundamental properties of 
Daubechies wavelets. Then we will describe how the var- 
ious operations needed in an electronic structure calcu- 
lations are done in a scaling function/ wavelet basis. The 
last part of the paper illustrates the performances of our 
DFT code based on Daubechies wavelets. 

II. ADAPTIVITY IN A WAVELET BASIS 

There are two fundamental functions in wavelet the- 
ory (121 fTB)l . the scaling function (t){x) and the wavelet 
ip{x). 

The most important property of these functions is that 
they satisfy the so-called refinement equations 

m 

cj){x)^V2 J2 h,c^i2x-j) (1) 

j = l-m 

which establishes a relation between the scaling functions 
on a grid with grid spacing h and another one with spac- 
ing h/2. hj and gj = {—ly h_j^i are the elements of 
a filter that characterizes the wavelet family, and m is 
the order of the scaling function-wavelet family. All the 
properties of these functions can be obtained from the 
relations (jlj. The full basis set can be obtained from all 
translations by a certain grid spacing h of the mother 
function centered at the origin. The mother function is 
localized, with compact support. The maximally sym- 
metric Daubechies scaling function and wavelet of order 
16 that are used in this work are shown in Fig. [T] 

For a three-dimensional description, the simplest basis 
set is obtained by a set of products of equally spaced 
scaling functions on a grid of grid spacing h' 

0,,,, (r) = (l)ix/h' - z) </)(y//i' - j) <p{z/h' - k) . (2) 
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FIG. 1 Daubechies scaling function (j) and wavelet tp of order 
16. Both are different from zero only in the interval from -7 
to 8. 



In other terms, the three-dimensional basis functions are 
a tensor product of one dimensional basis functions. Note 
that we are using a cubic grid, where the grid spacing is 
the same in all directions, but the following description 
can be straightforwardly applied to general orthorombic 
grids. 

The basis set of Eq. [2] is equivalent to a mixed basis 
set of scaling functions on a twice coarser grid of grid 
spacing h — 2h' 



wavelets. In this region the resolution is thus doubled 
in each spatial dimension compared to the low resolution 
region. Fig. |2] shows the 2-level adaptive grid around a 
water molecule. 




FIG. 2 A 2-level adaptive grid around a II2O molecule. The 
high resolution grid points carrying both scaling functions and 
wavelets are shown in blue (larger points), the low resolution 
grid points carrying only a single scaling function are shown 
in yellow (smaller points). 



0jj,fe(r) = (j){x/h - i) (j){ylh - j) (j){z/h - k) (3) 
augmented by a set of 7 wavelets 
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This equivalence follows from the fact that, from Eq. ([T|), 
every scaling function and wavelet on a coarse grid of 
spacing h can be expressed as a linear combination of 
scaling functions at the fine grid level h' and vice versa. 

The points of the simulation grid fall into 3 different 
classes. The points which are very far from the atoms will 
have virtually zero charge density and thus will not carry 
any basis functions. The remaining grid points are either 
in the high resolution region which contains the chemi- 
cal bonds or in the low resolution regions which contains 
the exponentially decaying tails of the wavefunctions. In 
the low resolution region one uses only one scaling func- 
tion per coarse grid point, whereas in the high resolu- 
tion region one uses both the scaling function and the 7 



A wavefunction ^'(r) can thus be expanded in this ba- 
sis: 

'il,«2,i3 

jl,j2,j3 V = l 

The sum over ii, 12, 13 runs over all the grid points con- 
tained in the low resolution region and the sum over ji, 
J2i is over all the points contained in the smaller high 
resolution region. 

The decomposition of scaling function into coarser scal- 
ing functions and wavelets can be continued recursively 
to obtain more than 2 resolution levels. We found how- 
ever that a high degree of adaptivity is not of paramount 
importance in pseudopotential calculations. In other 
terms, the pseudopotentials smooth the wavefunctions so 
that two levels of resolution are enough in most cases to 
achieve good computational accuracy. In addition, more 
than two resolution levels lead to more complicated algo- 
rithms such as the non-standard operator form (|T4|) that, 
in turn, lead to larger prefactors. 

The transformation from a pure fine scaling function 
representation (a basis set which contains only scaling 
functions centered on a finer grid of spacing h!) to a 
mixed coarse scaling function/ wavelet representation is 
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done by the fast wavelet transformation (il3j) which is a 
convolution and scales linearly with respect to the num- 
ber of basis functions being transformed. 

The wavefunctions are stored in a compressed form 
where only the nonzero scaling function and wavelets co- 
efficients are stored. The basis set being orthogonal, sev- 
eral operations such as scalar products among different 
orbitals and between orbitals and the projectors of the 
non-local pseudopotential can directly be done in this 
compressed form. In the following sections we will illus- 
trate the main operations which must be performed in 
the context of a DFT calculation. 



ions these potentials have this form: 
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III. OVERVIEW OF THE METHOD 

In the KS formulation of DFT, the electronic density 
of a system of N electrons can be calculated from the 
square modulus of a set of wavefunctions: 

N/2 

p{v) = Y,nit\^,{v)\' , (6) 

i=l 

where the KS wavefunctions \^>i) are eigenfunctions of 
the KS Hamiltonian, with pseudopotential Vpsp- 

(^-^V^ -t- VksIp] + ^psp) I*.) = £.1*^) ■ (7) 



where Ygm are the spherical harmonics, and rioc, ri are, 
respectively, the localization radius of the local pseudopo- 
tential term and of each projector. 

The analytic form of the pseudopotentials together 
with the fact that their expression in real space can be 
written in terms of a linear combination of tensor prod- 
ucts of one dimensional functions is of great utility in our 
method. 

Each term in the Hamiltonian is implemented differ- 
ently, and will be illustrated in the following sections. Af- 
ter the application of the Hamiltonian, the KS wavefunc- 
tions are updated via a direct minimisation scheme ('18J , 
which in its actual implementation is fast and reliable 
for non-zero gap systems, namely insulators. At present 
we have concentrated on systems with a gap, however we 
see no reason why the method can not be extended to 
metallic systems. 



For the sake of simplicity we assume in this description 
that our electronic system is a closed-shell system of non- 
spin-polarised electronic orbitals. For this reasons we 
have exactly N/2 KS wavefunctions and Vi nocc = 2. 

The KS potential 

Vks [p] = Vh [p] + Kc [p] + 14xt , (8) 

contains the Hartree potential, solution of the Poisson's 
equation 'V^Vh = — 47rp, the exchange-correlation po- 
tential Kcc and the external ionic potential Vext acting 
on the electrons. The method we illustrate in this paper 
is conceived for isolated systems, namely free boundary 
conditions. 

In our method, we choose the pseudopotential term 
Vpsp to be of the form of norm-conserving GTH-HGH 
pseudopotentials (|15l [T5] fTT]! . which have a local and a 
nonlocal term, Vpsp = Viocai + Koniocai- For each of the 



IV. TREATMENT OF KINETIC ENERGY 

The matrix elements of the kinetic energy operator 
among the basis functions of our mixed representation 
(i.e. scaling functions with scaling functions, scaling 
function with wavelets and wavelets with wavelets) can 
be calculated analytically For simplicity, let us il- 

lustrate the application of the kinetic energy operator 
onto a wavefunction ^' that is only expressed in terms of 
scaling functions. 

'^{x,y,z) = ^Si,.i^^,^(j>{x/h-ii)(t){y/h-i2)(t){z/h-i3) 

The result of the application of the kinetic energy opera- 
tor on this wavefunction, projected to the original scaling 
function space, has the expansion coefficients 

Sii,i2,i3 ^ J (t){x/h-ii)ct){y/h-i2)(l){z/h-i3) X 

xA"^{x,y,z) dxdydz 
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Analytically the coefficients Sii,i2.«3 ^^'^ ^iiM,^ ^'^^ 
lated by a convolution 



- T K - 



where 



and 



h J2 ,i3 



dx(j){x/h — ii) d1.4>{x/h) . 



(11) 



(12) 



(13) 



Using the refinement equation Q, the values of the Ti 
can be calculated analytically, from a suitable eigenvector 
of a matrix derived from the wavelet filters ((1^1) . For this 
reason the expression of the kinetic energy operator is 
exact in a given Daubechies basis. 
Since the 3-dimensional kinetic energy filter Ki^ i^ is 



a product of three one-dimensional filters (Eq. 12) the 



convolution in Eq. 11 can be evaluated with 3N1N2N3L 
operations for a three-dimensional grid of N1N2N3 grid 
points. L is the length of the one-dimensional filter which 
is 29 for our Daubechies family. The kinetic energy can 
thus be evaluated with linear scaling with respect to 
the number of nonvanishing expansion coefficients of the 
wavefunction. This statement remains true for a mbced 
scaling function-wavelet basis where we have both non- 
vanishing s and d coefficients and for the case where the 
low and high resolution regions cover only parts of the 
cube of N1N2N3 grid points. 

The Daubechies wavefunctions of degree 16 have an 
approximation error of h^, i.e. the difference between 
the exact wavefunction and its representation in a finite 
basis set (Eq. [s]) is decreasing as h^. The error of the 
kinetic energy in a variational scheme decreases then as 
^2-8-2 _ ^14 (piJ[| T^jii ggg ^Y^Q kinetic energy 

is limiting the convergence rate in our scheme and the 
overall convergence rate is thus h^^. Figure [s] shows this 
asymptotic convergence rate. 



V. TREATMENT OF LOCAL POTENTIAL ENERGY 

In spite of the striking advantages of Daubechies 
wavelets the initial exploration of this basis set ([2l]l did 
not lead to any algorithm that would be useful for real 
electronic structure calculations. This was due to the fact 
that an accurate evaluation of the local potential energy 
is difficult in a Daubechies wavelet basis. 

By definition, the local potential V{r) can be easily 
known on the nodes of the uniform grid of the simulation 
box. Approximating a potential energy matrix element 
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FIG. 3 Convergence rate 0(h^'^) of the wavelet code for a 
test run on a carbon atom. For this run the interpolation 
parameters are found to be, within 2% accuracy: A — 344, 
B = —1239, C — 1139. Other test systems gave comparable 
convergence rates. 



by 



i,j,k;i',j',k' ~ ^ </'i',j',fe'(r;,m,n)^(ri,m,n)0ij-,fc(r/,m,n) 
Lm,n 



gives an extremely slow convergence rate with respect to 
the number of grid points used to approximate the inte- 
gral because a single scaling function is not very smooth, 
i.e. it has a rather low number of continuous deriva- 
tives. A. Neelov and S. Goedecker have shown that 
one should not try to approximate a single matrix ele- 
ment as accurately as possible but that one should try 
instead to approximate directly the expectation value 
of the local potential. The reason for this strategy is 
that the wavefunction expressed in the Daubechy basis 
is smoother than a single Daubechies basis function. A 
single Daubechies scaling function of order 16 has only 
4 continuous derivatives. By suitable linear combina- 
tions of Daubechies 16 one can however exactly repre- 
sent polynomials up to degree 7, i.e functions that have 
7 non- vanishing continuous derivatives. The discontinu- 
ities get thus canceled by taking suitable linear combi- 
nations. Since we use pseudopotentials, our exact wave- 
functions are analytic and can locally be represented by 
a Taylor series. We are thus approximating functions 
that are approximately polynomials of order 7 and the 
discontinuities nearly cancel. 

Instead of calculating the exact matrix elements we 
therefore use matrix elements with respect to a smoothed 
version ip of the Daubechies scaling functions. 



dr(^i/j/,fc/(r)F(r)0^j,fe(r) 



l,m,n 

<p0,Qfi{'l'^l-i' .m-j' ,n-k')V{ri,m,n)(i>0,0,o{^l-i,m-j,n-k) 

l,m,n 

(14) 
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where the smoothed wave function is defined by 

and (jJi is the "magic filter" . The relation between the 
true functional values, i.e. the scaling function, and w is 
shown in figure|4] Even though Eq.[T4]is not a particulary 
good approximation for a single matrix element it gives 
an excellent approximation for the expectation values of 
the local potential energy 



dx dy dz'if{x,y,z)V{x,y,z)'i'{x,y,z) 



and also for matrix elements between different wavefunc- 
tions 



dx dy dz-^i{x,y,z)V{x,y,z)-^ j{x,y,z) 



in case they are needed. In practice we do not explicitly 
calculate any matrix elements but we apply only filters to 
the wavefunction expansion coefficients as will be shown 
in the following. This is mathematically equivalent but 
numerically much more efficient. 




FIG. 4 The magic filter Ui for the least asymmetric 
Daubechies-16 basis. 



Since the operations with the local potential V are per- 
formed in the computational box on the double resolu- 
tion grid with grid spacing h' — h/2, we must perform a 
wavelet transformation before applying the magic filters. 
These two operations can be combined in one, giving rise 
to modified magic filters both for scaling functions and 
wavelets on the original grid of spacing h. These modi- 
fied magic filters can be obtained from the original ones 
using the refinement relations and they are shown in Fig- 
ures|5]and|6] Following the same guidelines as the kinetic 
energy filters, the smoothed real space values ^i,j,k of a 
wavefunction ^ are calculated by performing a product of 
three one-dimensional convolutions with the magic filters 
along the x, y and z directions. For the scaling function 
part of the wavefunction the corresponding formula is 



(1) 



(1) 



(1) 



where is the filter that maps a scaling function on a 
double resolution grid. Similar convolutions are needed 
for the wavelet part. The calculation is thus similar to 
the treatment of the Laplacian in the kinetic energy. 
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FIG. 5 The fine scale magic filter u'^' (combination of a 
wavelet transform and the magic filter in figure H for the 
least asymmetric Daubechies-16 basis, scaled by v2 for com- 
parison with the scaling function. The values of the filter on 
the graph are almost undistinguishable from the values of the 
scaling function. However, there is a slight difference which 
is important for the correct asymptotic convergence at small 
values of grid spacing h. 
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FIG. 6 The fine scale magic filter v^^^ (combination of a 
wavelet transform and the magic filter in figureB| for the least 
asymmetric Daubechies-16 wavelet, scaled by v2 for compar- 
ison with the wavelet itself. 



Once we have calculated ^ij,k the approximate expec- 
tation value ev of the local potential V for a wavefunc- 
tion ^' is obtained by simple summation on the double 
resolution real space grid: 



jl J2 J3 



^ il ,i2 ,i3 ^1 ,i2 J3 ^il J2 J3 
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The evaluation of the local potential energy ey con- 
verges with a convergence rate of h^^ to the exact value 
where h is the grid spacing. Therefore, the potential en- 
ergy has a convergence rate two powers of h faster than 
the rate for the kinetic energy. 



VI. CALCULATION OF HARTREE POTENTIAL 

We saw in the section on the treatment of the local 
potential energy how to express efficiently the point val- 
ues of the smoothed wavefunction ^' on the fine grid 
mesh. From these values the charge density on a grid 
point ji , j2 , js of the double resolution grid is given by 



(15) 



where riocc are the occupation numbers. For a closed shell 
system they equal 2 for the occupied orbitals and zero for 
all other orbitals. The discrete charge density j2 is 
a very good approximation to the charge distribution of 
the continuous wavefunctions |^) in the sense that the 
first multipoles of the discrete charge distribution con- 
verge rapidly to the values of the continuous charge dis- 
tribution. The monopole converges with a rate of h^^. 
For each higher multipole moment the convergence rate 
is reduced by one power of h, i.e. dipoles converge with 
a rate of h^^, quadrupoles with h^'^, etc. The discrete 
charge density p on the double resolution grid is then 
the input to various Poisson solvers that are available 
for different boundary conditions. In the case of free 
boundary conditions, appropriate for isolated molecules, 
the values Pji,j2,j3 form the coefficients for an expansion 
in interpolating scaling functions of order 16. This ex- 
pansion strictly conserves all the multipoles up to the 
angular moment £ — 15 and allows to solve the inte- 
gral equation for the potential explicitly with the correct 
boundary conditions (|23p. In addition to free boundary 
conditions we have also implemented surface boundary 
conditions i.e. periodicity in 2 directions and free 

boundary conditions in the third direction. In this case 
the charge density is represented in a mixed plane wave- 
scaling function representation. 

These Poisson solvers have a convergence rate of ft.'™, 
where m is the order of the interpolating scaling func- 
tions used to express the Poisson kernel. Since we use 
interpolating scaling functions of order 16 the conver- 
gence rate of the electrostatic potential is faster than the 
rate for the kinetic energy. All these Poisson Solvers have 
one thing in common, they perform explicitly the convo- 
lution of the density with the Green's functions of the 
Poisson's equation. The necessary convolutions are done 
by a traditional zero-padded FFT procedure which leads 
to an 0{N log N) operation count with respect to the 
number of grid points N. The accuracy of the potential 
is uniform over the whole volume and one can thus use 
the smallest possible volume compatible with the require- 
ment that the tails of the wavefunctions have decayed to 



very small values at the surface of this volume. The frac- 
tion of the computational time needed for the solution of 
the Poisson's equation decreases with increasing system 



size and is roughly 1% for large systems, see section XVI 



Moreover, the explicit Green's function treatment of the 
Poisson's solver allows us to treat isolated systems with a 
net charge directly without the insertion of compensating 
charges. 



VII. XC FUNCTIONALS AND IMPLEMENTATION OF 
GGA'S 

The charge density expression used to calculate the 
Hartree potential 



P(r) = E"oil*.(r) 



(16) 



is also used for the calculation of the exchange correlation 
energy E-^c and the corresponding potential V^c- Any 
real-space based implementation of the XC functionals 
fits well with this density representation. In our program 
we use the XC functionals as implemented in ABINIT 
code. To this aim, we use the same ABINIT XC routines 
to calculate the exchange correlation energy 



(17) 



-Bxc = J p(r)exc(r)dr 
together with the XC potential 



^xc(r) = 



Spiv) 



(18) 



Also spin-polarised (coUinear) version of the ABINIT XC 
functionals can be used with our method. 

In the case of GGA exchange-correlation functionals 
the XC energy density depends both on the local values 
of the charge density p and on the modulus of its gradient: 



,(r) = exe(p(r),|Vp|(r)) 



(19) 



A traditional finite difference scheme of fourth order is 
used on the double resolution grid to calculate the gra- 
dient of the charge density 



dwPi^il ,12,13 ) 



E(t) 
'^n,«2,i3;Jl,j2,j3^Jl J2 J3 ' 

jl ,j2 ,j3 



(20) 



where w — x,y, z. For grid points close to the boundary 
of the computational volume the above formula requires 
grid points outside the volume. For free boundary con- 
ditions the values of the charge density outside the com- 
putational volume in a given direction are taken to be 
equal to the value at the border of the grid. 

The relation between the gradient and the density 
must be taken into account when calculating V^c in the 
standard White-Bird approach (25)1 . where the density 
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gradient is considered as an explicit functional of the den- 
sity. There the XC potential can be split in two terms: 



where 



Kc(r.,.,„.3) = K°cW + KiW 



(21) 



(22) 



|vp| d\vp\ 



(w) 



where the "ordinary" part is present in the same form 
of LDA functionals, while the White-Bird "correction" 
term V^^ appears only when the XC energy depends ex- 
plicitly on |Vp|. The c^™) are the coefficients of the finite 
difference formula used to calculate the gradient of the 
charge density ( 20 1 . 



The evaluation of the XC terms and also, when needed, 
the calculation of the gradient of the charge density, may 
easily be performed together with the Poisson solver used 
to evaluate the Hartree potential. This allows us to save 
computational time. 



VIII. TREATMENT OF THE NON-LOCAL 
PSEUDOPOTENTIAL 

The energy contributions from the non-local pseudopo- 
tential have for each angular moment I the form 



where \pi) is a pseudopotential projector. Once applying 
the hamiltonian operator, the application of one projec- 
tor on the wavefunctions requires the calculation of 



If we use for the projectors the representation of Eq. [5] 
(i.e. the same as for the wavefunctions) both operations 
are trivial to perform. Because of the orthogonality of the 
basis set we just have to calculate scalar products among 
the coefficient vectors and to update the wavefunctions. 
The scaling function and wavelet expansion coefficients 
for the projectors are given by (13)) 



p{v) 0.,,.„,3(r)dr , / p{v) C„,„.3(r)dr . (23) 



where we used the notation (|3| , (|4 1 . 

The GTH-HGH pseudopotentials $151 US]) have projec- 
tors which are written in terms of gaussians times polyno- 
mials. This form of projectors is particularly convenient 



to be expanded in the Daubechies basis. In other terms, 
since the general form of the projector is 



the 3-dimensional integrals can be calculated easily 
since they can be factorized into a product of 3 one- 
dimensional integrals. 



(24) 



+00 



e-"^ f,p{t/h - j)dt (25) 



The one-dimensional integrals are calculated in the 
following way. We first calculate the scaling function 
expansion coefficients for scaling functions on a one- 
dimensional grid that is 16 times denser. The integration 
on this dense grid is done by the well-known quadra- 
ture introduced in (j26p, that coincides with the magic 
filter ((22| . This integration scheme based on the magic 
filter has a convergence rate of h^^ and we gain there- 
fore a factor of 16^^ in accuracy by going to a denser 
grid. This means that the expansion coefficients are for 
reasonable grid spacings h accurate to machine preci- 
sion. After having obtained the expansion coefficients 
with respect to the fine scaling functions we obtain the 
expansion coefficients with respect to the scaling func- 
tions and wavelets on the required resolution level by 
one-dimensional fast wavelet transformations. No accu- 
racy is lost in the wavelet transforms and our represen- 
tation of the projectors is therefore typically accurate to 
nearly machine precision. 



IX. PRECONDITIONING 

As already mentioned, direct minimisation of the total 
energy is used to find the converged wavefunctions. The 
gradient gi of the total energy with respect to the i-th 
wavefunction is given by 



(26) 



where A^j — {ipjlHlipi) are the Lagrange multipliers en- 
forcing the orthogonality constraints. Convergence is 
achieved when the average norm of the residue {gilgi)^^"^ 
is below an user-defined numerical tolerance. 

Given the gradient direction at each step, several al- 
gorithms can be used to improve convergence. In our 
method we use either preconditioned steepest-descent al- 
gorithm or preconditioned DIIS method (|27l l28jl . These 
methods work very well to improve the convergence for 
non-zero gap systems if a good preconditioner is avail- 
able. 

The preconditioning gradient \gi) which approximately 
points in the direction of the minimum is obtained by 
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solving the linear system of equations obtained by dis- 
cretizing the equation 



1. 



(27) 



The values are approximate eigenvalues obtained by 
a subspace diagonalization in a minimal basis of atomic 
pseudopotential orbitals during the generation of the in- 
put guess. For isolated systems, the values of the for 
the occupied states are always negative, therefore the op- 
erator of Eq. (27 1 is positive definite. 

Eq. ( p7| is solved by a preconditioned conjugate gradi- 
ent (CG) method. The preconditioning is done by using 
the diagonal elements of the matrix representing the op- 
erator I — e-i in a scaling function- wavelet basis. In the 
initial step we use £ resolution levels of wavelets where £ 
is typically 4. To do this we have to enlarge the domain 
where the scaling function part of the gradient is defined 
to a grid that is a multiple of 2^. This means that the 
preconditioned gradient tji will also exist in a domain that 
is larger than the domain of the wavefunction Never- 
theless this approach is useful since it allows us to obtain 
rapidly a preconditioned gradient that has the correct 
overall shape. In the following iterations of the conju- 
gate gradient we use only one wavelet level in addition to 
the scaling functions for preconditioning. In this way we 
can do the preconditioning exactly in the domain of ba- 
sis functions that are used to represent the wavefunctions 
(Eq. |5]). A typical number of CG iterations necessary to 
obtain a meaningful preconditioned gradient is 5. 



X. ORTHOGONALIZATION 

We saw the need of keeping the wavefunctions or- 
thonormal at each step of the minimisation loop. This 
means that the overlap matrix S, with matrix elements 



(28) 



must be equal to the identity matrix. 

All the orthogonalization algorithms have a cubic com- 
plexity causing this part of the program to dominate for 
large systems, see Fig. 
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We therefore optimized this 
part carefully and found that a pseudo-Gram-Schmidt al- 
gorithm that uses a Cholesky factorization of the overlap 
matrix S is the most efficient method on parallel com- 
puters. In the following, we discuss the reasons for this 
choice by comparing it to two other orthogonalization al- 
gorithms: classical Gram-Schmidt and Loewdin orthog- 
onalizations. 



each orbital. The overlap of the currently processed or- 
bital l^*,;) with the set of the already processed orbitals 



is calculated and is removed from 

"^i) is normalized. 



'3 = 1,-- ,1-1 

Thereafter, the transformed orbital 



I*, 



(29) 



(30) 



The algorithm consists of the calculation of n{n + l)/2 
scalar products and wavefunction updates. If the coeffi- 
cients of each orbital are distributed among several pro- 
cessors n(n + l)/2 communication steps are needed to 
sum up the various contributions from each processor to 
each scalar product. Such a large number of communica- 
tion steps leads to a large latency overhead on a parallel 
computer and therefore to poor performances. 



B. Loewdin orthogonalization 

The Loewdin orthonormalization algorithm is based on 
the following equation: 



(31) 



where a new set of orthonormal orbitals j^fi) is obtained 
by multiplying the inverse square-root of the overlap ma- 
trix S with the original orbital set. 

The implementation of this algorithm requires that the 
overlap matrix S is calculated. As 5 is a symmetric ma- 
trix, we need to calculate only a triangle of the origi- 
nal matrix which results in n{n + l)/2 scalar products. 
In contrast to the classical Gram-Schmidt algorithm the 
matrix elements Sij depend on the original set of orbitals 
and can be calculated in parallel in the case where each 
processor holds a certain subset of the coefficients of each 
wavefunction. At the end of this calculation a single com- 
munication step is needed to sum up the entire overlap 
matrix out of the contributions to each matrix element 
calculated by the different processors. Thereafter, the 
inverse square-root of S is calculated. For this, we use 
the fact that S is an hermitian positive definite matrix. 
Thus, there exist a unitary matrix U which diagonalizes 
S = U*AU, where A is a diagonal matrix with positive 
eigenvalues. Consequently, S'"' = U^A~2U. Hence, an 
eigenvalue problem must be solved in order to find U and 
A. 



A. Gram-Schmidt orthogonalization 

The classical Gram-Schmidt orthonormalization algo- 
rithm generates an orthogonal set of orbital {l^'i)} out 
of a non-orthogonal set {|^i)}, by processing separately 



C. Pseudo Gram-Schmidt using Cholesky Factorization 

In this scheme a Cholesky factorization of the overlap 
matrix S = LL"'" is calculated. The new orthonormal 
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orbitals are obtained by 



(32) 



and are equivalent to the orbitals obtained by the clas- 
sical Gram-Schmidt. The procedure for calculating the 
overlap matrix out of the contributions calculated by each 
processor is identical to the Loewdin case. Instead of 
solving an eigenvalue problem we have however to calcu- 
late the decomposition of the overlap matrix. This can 
be done much faster. Thus, this algorithm has a lower 
pre-factor than the Loewdin scheme and requires only 
one communication step on a parallel computer. 



XI. CALCULATION OF FORCES 

Atomic forces can be calculated with the same method 
used for the application of the hamiltonian onto a wave- 
function. Since the scaling function/ wavelet basis is not 
moving together with atoms, we have no Pulay forces ([29]) 
and atomic forces can be evaluated directly through the 
Feynman-Hellmann theorem. Except for the force aris- 
ing from the trivial ion-ion interaction, which for the i-th 
atom is 



(ionic) 



E 



(33) 



the energy terms which depend explicitly on the atom 
positions are related to the pseudopotentials. As shown 
in the previous sections, the GTH-HGH pseudopotentials 
we are using are based on separable functions (|15l ITH)l . 
and can be splitted into a local and a non-local contribu- 
tion. 

For an atom i placed at position R^, the contribution 
to the energy that comes from the local part of the pseu- 
dopotential is 



-Blocal(R-») = j '^^ Mocal(|r - Ri|)p(r) • 



(34) 



Where the local pseudopotential can be split into long 
and a short-ranged terms Viocai(0 = + ^s(0; 



^^L(0 = -yCrf 



Vs{l) = exp 



+ C3 



2rj 

4 



C1 + C2 



(35) 



re 



where the Ci and ri are the pseudopotential parameters, 
depending on the atom of atomic number Zi under con- 
sideration. The energy contribution £'iocai(Ri) can be 



rewritten in an equivalent form. It is straightforward to 
verify that 

£^iocai(Rz)= J dr pi^{\r-RSVH{r) 

+ J drV5(|r-R,|)p(r) , (36) 

where Vh is the Hartree potential, and is such that 
VrVidr-Ril) = -47rpL(|r-Ri|). This analytical trans- 
formation remains also valid in our procedure for solving 
the discretized Poisson's equation. From equation (36 1 
we can calculate 



PLil) 



1 



(27r)3/2 



7 I'' 

e ' 



(37) 



which is a localized (thus short-ranged) function. The 
forces coming from the local pseudopotential are thus 



, (local) 



aR, 

rt] r-RJ 



p'^(|r-R,|)FH(r) 



F^(|r-R,|)p(r) 



(38) 



where 



1 Z,r 



(27r)3/2 r4 



loc 



I 



VsH) - -e 



(2C2-Ci) + (4C3-C2) 



re 



+ (6C4~C3) 



-C4 



(39) 



Within this formulation, the contribution to the forces 
from the local part of pseudopotential is written in terms 
of integrals with localized functions (gaussians times 
polynomials) times the charge density and the Hartree 
potential. This allows us to perform the integrals only 
in a relatively small region around the atom position 
and to assign different integrations to different proces- 
sors. Moreover, the calculation is performed with almost 
linear {0{N log N)) scaling. 

The contribution to the energy that comes from the 
nonlocal part of the pseudopotential is, as we saw in sec- 
tion |VlTT] 

i?„o„local(R») =EE<*IP™(^^))^™„(Pn(R.)l*) , 

; mn 

. ^^^^ 

where we wrote explicitly the dependence of the projector 
on the atom position R^. The contribution of this term 
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to the atomic forces is thus 

I 7n,n * 

-^(vI/b(R,))/i„„(^^|vI/). (41) 

Expressing the derivatives of the projectors in the 
Daubechies basis, the evaluation of the scalar products 
is straightforward. The scahng functions - wavelets ex- 
pansion coefficients of the projector derivatives can be 
calculated with machine precision accuracy in the same 
way as the projectors themselves were calculated. This 
is due to the fact that the derivative of the projectors are 
like the projectors themselves products of gaussians and 
polynomials. 

XII. LOCALIZATION PROPERTIES AND SMOOTHNESS 
OF THE BASIS FUNCTIONS 

As discussed above, Daubechies basis functions are 
suitable for expanding localised functions. There is no 
need to put basis functions on grid points that are far 
from the atoms. For this reason, we choose to associate 
the basis functions to points lying inside the union of 
atom-centered spheres defined by their radii. This op- 
eration must be performed both for the high and low 
resolution grid points (see Figure [2|. In our method, we 
measure these radii in two different units. For the high 
resolution region the radius is expressed in terms of the 
shortest localisation radius of the atom pseudopotential. 
For the low resolution region, the distance is expressed 
in units of the asymptotic decaying length of the atomic 
wavefunction 1 /\/2eHOMOi calculated from the energy 
ehomo of the highest occupied atomic orbital, obtained 
from pop. In this way we can easily determine nearly 
optimal sizes for the high and low resolution regions and 
minimize the number of degrees of freedom to achieve a 
target accuracy (Section [XVI I . 

We saw that Daubechies wavelets have the property 
that linear combinations of them can be smoother than 
a single Daubechies scaling function or wavelet. The 
wavefunction of Eq. [5] is thus typically smoother than 
the scaling functions and wavelets used to represent it. 
The reduced smoothness of Daubechies scaling function 
of order 16 in the tail region can be seen from Fig. [7] The 
cancellation of discontinuities in the basis set by suitable 
linear combinations is only possible in an infinite interval 
where several basis functions are present between any two 
grid points. Since we use a finite grid of scaling functions 
in the tail region, the number of scaling functions that 
contribute to the value of the wavefunction at a certain 
point is dropping as we are going out of the computa- 
tional volume. The outermost intervals of the wavefunc- 
tion are actually only described by the tail of a single 
scaling function. Hence the wavefunction is getting less 
smooth towards its end. This reduced smoothness af- 
fects principally the kinetic energy. For systems without 



a net charge, far from the atoms the potential is very 
small and for this reason errors in the potential energy 
are decreasing exponentially with respect to the size of 
the computational volume. 

XIII. PERTURBATIVE CALCULATION OF THE FINITE 
SIZE CORRECTIONS 

Far from the atoms, each wavefunctions decays ex- 
ponentially with a decay rate which depends on its KS 
eigenvalue pTjl . If A is the amplitude of the tail of the 
wavefunction, the kinetic energy contribution of the non- 
smooth wavefunction in its tail region is of the order of 
A/h"^, whereas the exact wavefunction has a kinetic en- 
ergy of the order of A. As a consequence the kinetic 
energy error increases as one decreases h and the total 
energy increases as well if the computational volume is 
too small. We know, however, that the contribution to 
the kinetic energy in this region will depend uniquely on 
the asymptotic behaviour of the wavefunction, which is 
governed by its KS eigenvalue. In other terms, the mag- 
nitude of the kinetic energy error due to the localisation 
of the system in a finite volume can, in principle, be esti- 
mated by knowing the KS eigenvalue of the wavefunction. 

If, on the other hand the computational volume is 
large enough such that the amplitude A is very small 
our method shows a strict variational behaviour with a 
convergence rate of ft.^^ over a large range of grid spacings 
h. This is illustrated in Fig. [3] 
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FIG. 7 Zoom of the Daubechies scaling function near the 
border of its support. Both the function and its absolute 
value are plotted. 



The above described facts prompted us to develop a 
method that cuts off the wave function tail at a very 
large radius but which is computationally much less ex- 
pensive than a fully selfconsistent calculation in a very 
large computational volume. We do first a fully selfcon- 
sistent calculation in a medium size box and we add then 
afterwards the missing far tail to the wavefunction. Let 
us denote the wavefunction that we have calculated in 
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the medium size box by j^*) and the wavefunction in the 
very large box by \'^) + lA^*). As we will see \A'^) is 
negligible inside the medium size box. It is essentially 
the tail outside the original medium size box plus a part 
that cancels the non-smooth behaviour in the surface re- 
gion of the medium size box. Evidently \'^) + \A'^) has 
to satisfy the Schrodinger equation 



1. 



V2 -t- V{r) m + |A^)) = e(l^) + |A^)) 



Rearranging the term one obtains 



V{r) - e] \A^) = - [l-W' + V{v) - e] \^) 



(42) 

The term on the right hand side of the above equation 
is the gradient \g) that is needed in any minimization 
scheme. When the calculation of the wave functions is 
converged the gradient is zero (actually less than a small 
numerical tolerance) when projected onto the subspace 
of the basis functions spanning the medium size volume. 
The gradient is, however, not anymore zero when it is 
projected onto the basis set of the larger volume. In this 
case the projection onto the basis function just outside 
the medium size volume gives a nonzero contribution. 
Remember, that the fact that these basis functions are 
missing in the basis set of the medium size volume is 
causing the non- smooth behaviour. Projection on basis 
functions that are far outside the surface region of the 
medium size volume are again zero since |^) is identically 
zero. So, in this context, the gradient is a quantity that is 
nonzero only in a small shell outside the original medium 
size volume. The width of this shell is given by the length 
of the kinetic energy filter. Since the potential is very 
small in the tail region Eq. |42] can be approximated by 



1. 



V'-e lAvf) = |5) . 



As usual in a perturbative treatment we rely on the fact 
that the eigenvalues e converge faster than the wavefunc- 
tion and the zeroth order eigenvalues can therefore be 
used for the first order correction to the wavefunction. 
The above equation is identical to the preconditioning 
equation Eq.[27]and can be solved with the same method, 
just within a larger volume. In this way we can eliminate 
in a single preconditioning step at the end of the fully self- 
consistent calculation in the medium size volume a large 
fraction of the error arising from cutting off the wave- 
functions at the surface of our computational volume. 
We can thus have a reliable estimation of the approxi- 
mation resulting from the restriction of the system to a 
finite computational volume. Fig. [8] shows an example 
of the convergence rate of the total energy with respect 
to the size of the computational volume both with and 
without tail correction for two different grid spacings. 



10"' 



10'^ 



10"^ 



Ordinary wavefunctions minimisation 
With perturbative tail corrections 



ft = 0.35bolir 




3.5 4 4.5 5 5.5 6 6.5 7 

Localisation radius of coarse region (arb. units) 

FIG. 8 Absolute convergence of the total energy of a methane 
molecule as a function of the low resolution localization ra- 
dius with and without the tail corrections. The curves for 
two different values of the grid spacing are plotted, showing 
the h convergence for the localization parameter sufficiently 
extended. 



XIV. PARALLELIZATION 

Two data distribution schemes are used in the paral- 
lel version of our program. In the orbital distribution 
scheme, each processor works on one or a few orbitals 
for which it holds all its scaling function and wavelet 
coefficients. In the coefficient distribution scheme each 
processor holds a certain subset of the coefficients of all 
the orbitals. Most of the operations such as applying 
the Hamiltonian on the orbitals, and the precondition- 
ing is done in the orbital distribution scheme. This has 
the advantage that we do not have to parallelize these 
routines and we therefore achieve almost perfect paral- 
lel speedup. The calculation of the Lagrange multipliers 
that enforce the orthogonality constraints onto the gra- 
dient as well as the orthogonalization of the orbitals is 
done in the coefficient distribution scheme. For the or- 
thogonalization we have to calculate the matrix (^'jj^'i) 
and for the Lagrange multipliers the matrix (^j 
So each matrix element is a scalar product and each pro- 
cessor is calculating the contribution to this scalar prod- 
uct from the coefficients it is holding. A global reduc- 
tion sum is then used to sum the contributions to obtain 
the correct matrix. Such sums can esily be performed 
with the very well optimized BLAS-LAPACK libraries. 
Switch back and forth between the orbital distribution 
scheme and the coefficient distribution scheme is done by 
the MPI global transposition routine MPI_ALLTOALL. 
For parallel computers where the cross sectional band- 
width ()32j scales well with the number of processors this 
global transposition does not require a lot of CPU time. 
The most time consuming communication is the global 
reduction sum required to obtain the total charge distri- 
bution from the partial charge distribution of the indi- 
vidual orbital (sum in Eq. 151. 
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XV. CALCULATION OF UNOCCUPIED ORBITALS 

In order to calculate the unoccupied Kohn Sham or- 
bitals we use the Davidson method after having 

found the selfconsistent occupied Kohn Sham orbitals. 
An initial guess for the iVvirt unoccupied eigenvectors 
and eigenvalues of the Kohn Sham Hamiltonian i?KS 
is obtained from the subspace diagonalization in a min- 
imal atomic basis set that is also used to generate the 
input guess for the occupied orbitals. For any given set 
of virtual orbitals we calculate then the gradients (Eq. |26] 
where the Lagrange multipliers ensure only orthogonal- 
ity to the occupied orbitals) and precondition then these 
gradients according to Eq. [27] A subspace diagonaliza- 
tion is then done in the space spanned by the present 
set of approximate eigenvectors and their preconditioned 
gradients. In the original Davidson method the dimen- 
sion of the subspace is increased in each iteration since 
one keeps all the previous preconditioned gradients in the 
subspace. To save memory we have limited the dimen- 
sion of the subspace in each iteration to 2N^irt using only 
the present set of approximate eigenvectors together with 
their preconditioned gradients. Even though the number 
of requested unoccupied orbitals is typically small (fre- 
quently only the LUMO), a larger set of vectors N^i^t 
is considered in our method (in a parallel calculation at 
least one per processor), but only the gradients of the 
desired number of orbitals are taken into account for the 
convergence criterion for the norm of the gradients. This, 
together with the fact that our preconditioner is rather 
good allows us to achieve fast convergence rates compara- 
ble to the ones achieved in the calculation of the occupied 
orbitals. Some 20 iterations are typically needed. 



XVI. PERFORMANCE RESULTS 

We have applied our method on different molecular 
systems in order to test its performances. As expected, 
the localization of the basis set allows us to reduce consid- 
erably the number of degrees of freedom (i.e. the num- 
ber of basis functions which must be used) to attain a 
given absolute precision with respect to a plane wave 
code. This fact reduces the memory requirements and 
the number of floating point operations. Figure |9] shows 
the comparison of the absolute precision in a calculation 
of a 44 atom molecule as a function of the number of 
degrees of freedom used for the calculation. In table |l] 
the comparison of the timings of a single SCF cycle with 
respect to two other plane wave based codes are shown. 
Since the system is relatively small the cubic terms do not 
dominate. For large systems of several hundred atoms 
the gain in CPU time compared to a plane wave pro- 
gram is proportional to the reduction in the number of 
degrees of freedom (compare Eq. 



43 1 and can thus be 
very significant as one can conclude from Fig. [9j 

The parallellization scheme of the code has been tested 
and has given the efficiency detailed in Figure [TO] The 



10^ 



10" 



10-^ 



10-" 



E, = 40Ha 



Plane waves 



Wavelets 




lO"" 10° 
Number of degrees of freedom 

FIG. 9 Absolute precision (not precision per atom) as a func- 
tion of the number of degrees of freedom for a cinchonidine 
molecule (44 atoms). Our method is compared with a plane 
wave code. In the case of the plane wave code the plane wave 
cutoff and the volume of the computational box were cho- 
sen such as to obtain the required precision with the smallest 
number of degrees of freedom. In the case of our wavelet 
program the grid spacing h and the localzation radii were op- 
timized. For very high accuracies the exponential convergence 
rate of the plane waves beats the algebraic convergence rate of 
the wavelets. Such high accuracies are however not required 
in practice. Since convolutions can be executed at very high 
speed the wavelet code is faster than the plane wave code at 
any accuracy even if the number of degrees of freedom are 
similar (see table [l|. 
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40 
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TABLE I Computational time in seconds for a single min- 
imization iteration for different runs of the cinchonidine 
molecule used for the plot in figure |9] The timings for dif- 
ferent cutoff energies Ec for the plane waves runs are shown. 
The input parameters for the wavelet runs are chosen such 
as to obtain the same absolute precision of the plane wave 
calculations. The plane wave runs are performed with the 
ABINIT code, which uses iterative diagonalization and with 
CPMD code H34p in direct minimization. These timings are 
taken from a serial run on a 2.4GHz AMD Opteron CPU. 



overall efficiency is always higher than 88%, also for large 
systems with a big number of processors. 

It is also interesting to sec which is the computational 
share of the different sections of the code with respect to 



the total execution time. Figure II shows the percentage 
of the computational time for the different sections of the 
code as a function of the number of orbitals while keeping 
constant the number of orbitals per processor. The differ- 



14 



100 
98 
96 

S- 94 

£Z 

■q 

90 




^\ (with orbitals/proc) 



10 100 
Number of processors 



1000 



FIG. 10 Efficiency of the parallel implementation of the code 
for several runs with different number of atoms. The number 
close to each point indicates the number of orbitals treated 
by each processors, in the orbital distribution scheme. 



ent sections considered are the application of the hamilto- 
nian (kinetic, local plus nonlocal potential), the construc- 
tion of the density (Eq.( 15 1), the Poisson solver for creat- 
ing the Hartree potential, the preconditioning-DIIS, and 
the operations needed for the orthogonality constraint as 
well as the orthogonalization, which are mainly matrix- 
matrix products or matrix decompositions. These op- 
erations are all performed by linear algebra subroutines 
provided by the LAPACK libraries ([55)1 . Also, the percent- 
age of the communication time is shown. While for rel- 
atively small systems the most time-dominating part of 
the code is related to the Poisson solver, for large systems 
the most expensive section is by far the calculation of the 
linear algebra operations. The operations performed in 
this section scales cubically with respect to the number 
of atoms. Apart from the Cholesky factorization, which 
has a scaling of 0{n^j.^), where riorb is the number of 
orbitals, the cubic terms are of the form 



(43) 



where n is the number of degrees of freedom, i.e. the 
number of scaling function and wavelet expansion coef- 
ficients. Both the calculation of the overlap matrix in 
Eq. [28] and the orthogonality transformation of the or- 
bitals in Eq. [32] lead to this scaling. The number of the 
coefficients n is typically much larger than the number 
of orbitals. 



XVII. CONCLUSIONS 

In this paper we have shown the principal features of 
an electronic structure pseudopotential method based on 
Daubechies wavelets. Their properties make this basis 
set a powerful and promising tool for electronic struc- 
ture calculations. The matrix elements, the kinetic en- 
ergy and nonlocal pseudopotentials operators can be cal- 
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FIG. 11 Relative importance of diflerent code sections as a 
function of the number of atoms of a simple alkane chain, 
starting from single carbon atom. The calculation is per- 
formed in parallel such that each processor holds the same 
number of orbitals (two in this figure). Also the time in sec- 
onds for a single minimization iteration is indicated, showing 
the asymptotic cubic scaling of present implementation. 



culated analytically in this basis. The other operations 
are mainly based on convolutions with short-range filters, 
which can be highly optimized in order to obtain good 
computational performances. Our code shows high sys- 
tematic convergence properties, very good performances 
and an excellent efficiency for parallel calculations. This 
code is integrated in the ABINIT software package and 
is freely available under GNU-GPL license. At present, 
several developments are in progress to improve the fea- 
tures of this code. Mainly, they concern the extension 
of this formalism to fully periodic systems and surfaces, 
as well as the inclusion of non-coUinear spin-polarized 
XC functionals. A linear scaling version of this wavelet 
code is also under preparation and will be presented in a 
forthcoming paper. 
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